-
Notifications
You must be signed in to change notification settings - Fork 313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add runner to wait until snapshot has been created #1047
Add runner to wait until snapshot has been created #1047
Conversation
For snapshot creation (and metrics) so far we've relied only the `create-snapshot` runner with a blocking call. In this commit we are introducing a new runner `wait-for-snapshot-create`to complement `create-snapshot`. This is similar to `restore-snapshot` and `wait-for-recovery` and helps in cases where network connections maybe terminated making a blocking call unsuitable.
065beaf
to
293922e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I left a few comments.
docs/track.rst
Outdated
@@ -1119,7 +1119,24 @@ With the operation ``create-snapshot`` you can `create a snapshot <https://www.e | |||
* ``request-params`` (optional): A structure containing HTTP request parameters. | |||
|
|||
.. note:: | |||
When ``wait-for-completion`` is set to ``true`` Rally will report the achieved throughput in byte/s. | |||
It's not recommend to rely on ``wait-for-completion=true``. Instead you should keep the default value (``False``) and use an additional ``wait-for-snapshot-create`` operation in the next step. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "not recommend" -> "not recommended"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 32ee4ab
docs/track.rst
Outdated
@@ -1119,7 +1119,24 @@ With the operation ``create-snapshot`` you can `create a snapshot <https://www.e | |||
* ``request-params`` (optional): A structure containing HTTP request parameters. | |||
|
|||
.. note:: | |||
When ``wait-for-completion`` is set to ``true`` Rally will report the achieved throughput in byte/s. | |||
It's not recommend to rely on ``wait-for-completion=true``. Instead you should keep the default value (``False``) and use an additional ``wait-for-snapshot-create`` operation in the next step. | |||
This is mandatory on the `Elastic Cloud <https://www.elastic.co/cloud>`_ or environments where Elasticsearch is sitting behind a network element that may terminate the blocking connection after a timeout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- on the Elastic Cloud -> on Elastic Cloud?
- "sitting behind a network element" -> "connected via intermediate network components, such as proxies, that may terminate ..."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 32ee4ab
esrally/driver/runner.py
Outdated
# Possible states: | ||
# https://www.elastic.co/guide/en/elasticsearch/reference/current/get-snapshot-status-api.html#get-snapshot-status-api-response-body | ||
if response_state == "FAILED": | ||
self.logger.error("Snapshot [%s] failed. Response status:\n%s", snapshot, json.dumps(response)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- "Response status" -> "Response"? (as it is the actual full, response). I also think we should pretty-print it by specifying e.g.
indent=2
injson.dumps
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 32ee4ab
esrally/driver/runner.py
Outdated
# https://www.elastic.co/guide/en/elasticsearch/reference/current/get-snapshot-status-api.html#get-snapshot-status-api-response-body | ||
if response_state == "FAILED": | ||
self.logger.error("Snapshot [%s] failed. Response status:\n%s", snapshot, json.dumps(response)) | ||
raise exceptions.RallyAssertionError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Any reason why it is formatted this way? I tried putting everything on the same line and ended up with a line width of 110 which should still be fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 32ee4ab
esrally/track/track.py
Outdated
@@ -420,7 +420,7 @@ class OperationType(Enum): | |||
RawRequest = 5 | |||
WaitForRecovery = 6 | |||
CreateSnapshot = 7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could move CreateSnapshot
again to administrative actions (i.e. anything > 1000) because Rally would not report request metrics by default for administrative operations (it's usually not interesting to know).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 32ee4ab
@@ -2756,6 +2757,140 @@ async def test_create_snapshot_wait_for_completion(self, es): | |||
} | |||
}) | |||
|
|||
params = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I cannot comment at the specific line but in test_create_snapshot_no_wait
we still mock es.snapshot.status
and assert later on that it is not called. IMHO we should remove this now because there is no chance we'd ever call it anymore in the new runner implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests/driver/runner_test.py
Outdated
|
||
r = runner.WaitForSnapshotCreate() | ||
|
||
logger = logging.getLogger("esrally.driver.runner") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about we just use the runner's logger with logger = r.logger
(you can then probably just inline it in the with
statement below? This would also make it more refactoring safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 32ee4ab
docs/track.rst
Outdated
* ``snapshot`` (mandatory): The name of the snapshot that this operation will wait until it succeeds. | ||
* ``completion-recheck-wait-period`` (optional, defaults to 1 second): Time in seconds to wait in between consecutive attempts. | ||
|
||
Rally will report the achieved throughput in byte/s, the duration in seconds, the start and stop time in milliseconds and the total amount of files snapshotted as returned by the the `Elasticsearch snapshot status API call <https://www.elastic.co/guide/en/elasticsearch/reference/current/get-snapshot-status-api.html>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I'd document all the metadata that we return (except for the throughput)? Otherwise we should probably also document the respective metric keys and make clear that these metadata are only available with an Elasticsearch metrics store (contrary to the throughput).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 32ee4ab
Thanks for your comments! Could you PTAL? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating. LGTM!
For snapshot creation (and metrics) so far we've relied only the
create-snapshot
runner with a blocking call.In this commit we are introducing a new runner
wait-for-snapshot-create
to complementcreate-snapshot
. This issimilar to
restore-snapshot
andwait-for-recovery
and helps in caseswhere network connections maybe terminated making a blocking call
unsuitable.